AD-A236  856 


Are  All  Linear  Paired  Comparison  Models  Equivalent? 


Hal  Stern 

Department  of  Statistics 
Harvard  University 


Technical Report No. ONR-C-5



September,  1990 


Reproduction  in  whole  or  in  part  is  permitted  for  any 
purpose  of  the  United  States  Government 

This  document  has  been  approved  for  public  release  and 
sale,  its  distribution  is  unlimited. 


91-01901 


Are  All  Linear  Paired  Comparison  Models  Equivalent? 

Hal  Stern 

Department  of  Statistics 
Harvard  University 
Cambridge,  MA  02138  U.S.A. 

ABSTRACT 

Previous  authors  (Jackson  and  Fleckenstein  1957,  Mosteller  1958,  Noether 
1960) have found that different models of paired comparisons data lead to similar
fits. This phenomenon is examined by means of a set of paired comparison
models,  based  on  gamma  random  variables,  that  includes  the  frequently  applied 
Bradley-Terry  and  Thurstone-Mosteller  models.  A  theoretical  result  provides  a 
natural ordering of the models in the gamma family on the basis of their composition
rules. Analysis of several sports data sets indicates that all of the paired
comparison  models  in  the  family  provide  adequate,  and  almost  identical,  fits  to 
the  data.  Simulations  are  used  to  further  explore  this  result.  Although  not  all 
approaches  to  paired  comparisons  experiments  are  covered  by  this  discussion,  the 
evidence  is  strong  that  for  samples  of  the  size  usually  encountered  in  practice  all 
linear  paired  comparison  models  are  virtually  equivalent. 

Abbreviated  Title:  Comparing  Paired  Comparison  Models 
Keywords: Bradley-Terry Model, Thurstone-Mosteller Model

1. INTRODUCTION


In  a  paired  comparisons  experiment,  k  objects  are  compared  in  blocks  of  size 
two. Each comparison of two objects has two possible outcomes: either i is preferred
to  j  or  j  is  preferred  to  i.  Successive  comparisons  of  a  pair  of  objects  are  assumed 
to  be  independent.  In  addition,  comparisons  of  distinct  pairs  of  objects  are  assumed 
to  be  independent  of  each  other.  This  eliminates  the  notion  of  a  single  judge  who 
compares each of the $\binom{k}{2}$ distinct pairs, as the comparisons in this case would almost
certainly  not  be  independent.  A  variety  of  models  exist  for  the  analysis  of  data 
from  paired  comparisons  experiments,  including  the  Bradley-Terry  model  (Bradley 
and  Terry  1952)  and  the  Thurstone-Mosteller  model  (Thurstone  1927,  Mosteller 
1951).  Jackson  and  Fleckenstein  (1957)  and  Mosteller  (1958)  illustrate  that  these 
two  models,  as  well  as  several  others,  provide  similar  fits  to  a  data  set. 

A  family  of  paired  comparison  models  based  on  gamma  random  variables  (Stern 
1990)  provides  a  framework  for  further  consideration  of  the  similarity  of  paired 
comparison  models.  The  gamma  paired  comparison  models  are  a  subset  of  the 
class  of  linear  models  (David  1988)  that  includes  the  Bradley-Terry  and  Thurstone- 
Mosteller models. The probability that i is preferred to j in a gamma paired comparison
model with shape parameter r is equal to the probability that one gamma
random  variable  with  shape  parameter  r  is  smaller  than  a  second  gamma  random 
variable,  independent  of  the  first,  with  the  same  shape  parameter  but  different  scale 
parameter.  This  model  is  appropriate,  for  example,  if  we  compare  the  waiting  time 
until  r  events  occur  in  each  of  two  independent  Poisson  processes  with  different 
rates.  The  Bradley-Terry  model  is  obtained  by  choosing  r  =  1  and  the  Thurstone- 
Mosteller model is obtained as $r \to \infty$. In these cases, equivalence to the usual
stochastic  utility  model  is  obtained  by  considering  a  logarithmic  transformation  of 
the gamma random variables (Stern 1990).

Evidence from three different sources indicates that, for typical sample sizes,
the choice of a particular paired comparison model from among the set of gamma
models seems to have a small effect on the results obtained. In this paper,
analysis of several sports data sets indicates that almost identical fits are
obtained by several models. Close consideration of the case with k = 3 objects
provides some information about the source of the problem and provides an
estimate of the sample size required to distinguish between paired comparison
models. Finally, some simulations generalize the results to larger experiments.

In  the  next  three  sections,  a  variety  of  paired  comparison  models  are  discussed. 
The  evidence  concerning  the  question  in  the  title  of  the  paper  is  presented  in  Section 
5. 


2.  PAIRED  COMPARISON  MODELS 

The natural parameter in a paired comparisons experiment is $p_{ij}$, the
probability that i is preferred to j. Probability models for paired comparisons
experiments attempt to provide a concise description of the preference
probabilities $p_{ij}$, $i \ne j$. Sophisticated models have been developed to
account for the possibility of ties, covariates and order effects. For the
purposes of this discussion, only paired comparison models that ignore ties,
order effects and covariates are considered. By assumption, the preference
probability $p_{ij}$ remains constant throughout the experiment. The saturated
model for a paired comparisons experiment with k objects associates a parameter
$p_{ij}$ with the pair of objects i and j, thus using $k(k-1)/2$ parameters. A
more parsimonious model assigns a parameter $\lambda_i$ to each object and takes
$p_{ij} = P(\lambda_i, \lambda_j)$ for some function $P(\cdot,\cdot)$. This type
of model uses only k parameters. The Bradley-Terry and Thurstone-Mosteller
models are examples of this type. These models are now considered in more
detail, leading to a family of paired comparison models used throughout this
study.

The Bradley-Terry probability model assumes the probability that i is preferred
to j can be written as

$$p_{ij}^{(BT)} = \frac{\lambda_i}{\lambda_i + \lambda_j}.$$

Over  time  this  expression  has  been  derived  in  many  ways  including  a  derivation 
based  on  Luce’s  (1959)  Choice  Axiom  and  one  based  on  maximum  entropy  (Joe 
1987).  Two  motivations  that  are  central  to  this  paper  are  the  gamma  random 
variable  motivation  (Stern  1990)  and  the  linear  model  derivation  (David  1988,  Latta 
1979).  Throughout  the  paper,  paired  comparisons  experiments  are  discussed  using 
the  terminology  of  a  sports  competition  because  the  data  in  Section  5  is  of  this 
form. Suppose that team i scores points according to a Poisson process with rate $\lambda_i$
and team j scores points according to a Poisson process with rate $\lambda_j$. Furthermore,
suppose the two Poisson processes are independent. The waiting time for a point
to be scored in either process is an exponential random variable, or equivalently,
a gamma random variable with shape parameter 1. Then, the probability that
team i scores one point before team j is the probability that $X_i \sim \Gamma(1, \lambda_i)$ ($X_i$
a gamma random variable with shape parameter 1 and scale parameter $\lambda_i$) is less
than $X_j \sim \Gamma(1, \lambda_j)$ for independent random variables $X_i$, $X_j$. This probability is
the  Bradley-Terry  preference  probability  (Bradley  and  Terry  1952).  Holman  and 
Marley  derived  the  Bradley-Terry  model  in  terms  of  exponential  random  variables 
(equivalent  to  the  above  formulation)  in  1962  (see  Luce  and  Suppes  1965). 
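
The waiting-time argument can be checked numerically. The following is a minimal sketch, not part of the original report: the function names, the seed and the number of trials are illustrative assumptions, and $\lambda_i$ is treated as the rate of the Poisson process, as in the text.

```python
import random


def bradley_terry_prob(lam_i, lam_j):
    """Bradley-Terry probability that i is preferred to j."""
    return lam_i / (lam_i + lam_j)


def simulate_first_point(lam_i, lam_j, n_trials=200_000, seed=0):
    """Estimate P(team i scores the first point) by simulation.

    Each trial draws one exponential waiting time per team; team i
    "wins" the trial when its waiting time is the shorter one.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        # random.expovariate takes the rate of the process as its argument
        if rng.expovariate(lam_i) < rng.expovariate(lam_j):
            wins += 1
    return wins / n_trials
```

With rates 3 and 1, for instance, the simulated frequency should settle near the Bradley-Terry value 3/(3 + 1) = 0.75, up to Monte Carlo error.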

Other gamma paired comparison models are obtained by comparing gamma
random variables with shape parameters other than one. The point scoring
motivation suggests models with integer-valued shape parameter, but gamma random
variables are defined for any shape parameter r greater than zero. Suppose that
$G_\lambda(r)$ is a stochastic process with independent increments having the gamma
distribution, so that $G_\lambda(r_2) - G_\lambda(r_1)$ has the gamma distribution with shape
parameter $r_2 - r_1$ and scale parameter $\lambda$. Thus far, $G_\lambda(r)$ has been interpreted
for integer r as the waiting time for r points to be scored. However, the progress
of two gamma stochastic processes $G_{\lambda_i}(r)$ and $G_{\lambda_j}(r)$ can be compared for any
value of r > 0, suggesting the possibility of gamma paired comparison models with
non-integer shape parameters.
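
For integer r the point-scoring interpretation leads to a closed form that is easy to compute. The sketch below is an illustration rather than the report's own method: it relies on the standard observation that each point in the merged stream belongs to team i with probability $\lambda_i/(\lambda_i + \lambda_j)$, so team i is first to r points exactly when it scores at least r of the first 2r - 1 points. The function name is hypothetical.

```python
from math import comb


def gamma_pref_prob_integer(lam_i, lam_j, r):
    """P(team i is the first to score r points), for integer r >= 1."""
    if r < 1 or int(r) != r:
        raise ValueError("this sketch handles integer r >= 1 only")
    r = int(r)
    # chance that any given point in the merged stream is scored by team i
    q = lam_i / (lam_i + lam_j)
    m = 2 * r - 1
    # binomial tail: team i scores at least r of the first 2r - 1 points
    return sum(comb(m, s) * q**s * (1 - q) ** (m - s) for s in range(r, m + 1))
```

Setting r = 1 recovers the Bradley-Terry probability, and swapping the two rates returns the complementary probability.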

For the gamma paired comparison model with shape parameter r, the preference
probability is given by

$$p_{ij}^{(r)} = P(X_i < X_j) = \int_0^\infty \int_0^{z_j \lambda_i/\lambda_j} \frac{z_i^{r-1} \exp(-z_i)\, z_j^{r-1} \exp(-z_j)}{\Gamma(r)\Gamma(r)} \, dz_i \, dz_j = g_r(\lambda_i/\lambda_j). \qquad (1)$$


The final notation indicates that this probability depends only on the ratio of the
scale parameters of the gamma random variables. Since the probability is unchanged
if each $\lambda_i$ is multiplied by the same constant, $\sum_i \lambda_i = 1$ is adopted as a convention.
By reversing the roles of i and j, the natural relationship
$g_r(\lambda_i/\lambda_j) = 1 - g_r(\lambda_j/\lambda_i)$ is obtained. The preference probability is
increasing in the ratio $\lambda_i/\lambda_j$, and for $\lambda_i > \lambda_j$, $p_{ij}^{(r)}$ is increasing in r.
The first of these results indicates that the probability that the process with
the higher rate is the first to achieve r points increases as the difference
between the rates of the two processes becomes larger. This is easy to verify by
inspection of the expression (1). The second result implies that, if i scores
points faster than j, then comparing the processes after they have evolved for a
long time favors process i. If we take $\gamma = \lambda_i/\lambda_j$, this can be demonstrated by
examining $\partial p_{ij}^{(r)}/\partial r$ and $\partial^2 p_{ij}^{(r)}/\partial\gamma\,\partial r$ as functions of $\gamma$ and r.
The first derivative is equal to zero at $\gamma = 1$ and tends to zero as
$\gamma \to \infty$ for any r. The mixed second derivative is positive at $\gamma = 1$ for any r,
so the first derivative is positive for $\gamma$ slightly larger than one. The second
derivative remains positive until some critical value after which it is always
negative. Given this second derivative behavior, $\partial p_{ij}^{(r)}/\partial r$ must be positive
for all $\lambda_i > \lambda_j$. It follows that when $\lambda_i < \lambda_j$, $p_{ij}^{(r)}$ decreases as r increases.

Following David (1988), a set of preference probabilities $p_{ij}$, $i \ne j$, is said
to satisfy a linear model if there exist real numbers $v_1, \dots, v_k$ such that
$p_{ij} = H(v_i - v_j)$ for $H(\cdot)$ monotone increasing from $H(-\infty) = 0$ to
$H(\infty) = 1$ with $H(x) = 1 - H(-x)$. The function $H(\cdot)$ is the cumulative
distribution function (c.d.f.) of a random variable symmetric around zero and is
called the defining distribution of the linear model. The parameters $v_i$ measure
the positions of the k teams on a linear scale. A linear model with defining
distribution H is called a convolution type linear model if H can be derived as
the distribution of the difference between two independent random variables with
common c.d.f. $F(\cdot)$ and different location parameters. In this case F is called
the sensation distribution of the convolution type linear model. The
Bradley-Terry model is obtained by taking $v_i = \ln \lambda_i$ and $v_j = \ln \lambda_j$, with
$H(x) = (1 + \exp(-x))^{-1}$, the c.d.f. of the logistic distribution (Bradley 1953),
and $F(x) = \exp(-e^{-x})$, the c.d.f. of the extreme value distribution (Davidson 1969).
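
The linear-model form of the Bradley-Terry model can be written out directly. This is a small illustrative sketch with hypothetical function names; it only restates the identity $H(\ln \lambda_i - \ln \lambda_j) = \lambda_i/(\lambda_i + \lambda_j)$ for the logistic c.d.f.

```python
from math import exp, log


def logistic_cdf(x):
    """Defining distribution H of the Bradley-Terry model."""
    return 1.0 / (1.0 + exp(-x))


def linear_model_prob(v_i, v_j, H=logistic_cdf):
    """Preference probability p_ij = H(v_i - v_j) of a linear model."""
    return H(v_i - v_j)


def bradley_terry_as_linear_model(lam_i, lam_j):
    """Bradley-Terry probability through the positions v = ln(lambda)."""
    return linear_model_prob(log(lam_i), log(lam_j))
```

Passing a different symmetric c.d.f. as `H` gives another linear model on the same positions $v_i$.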

It turns out that, for any r, the gamma paired comparison model can be
expressed as a convolution type linear model where the density of the sensation
distribution is

$$f_r(x) = \frac{1}{\Gamma(r)} \exp\{-r(x - v_i)\} \exp\{-e^{-(x - v_i)}\}$$

with $v_i = \ln \lambda_i$, and the density of the corresponding defining distribution is

$$h_r(x) = \frac{\Gamma(2r)}{\Gamma(r)\Gamma(r)} \frac{e^{-rx}}{(1 + e^{-x})^{2r}}.$$

Integrating $h_r(x)$ and evaluating the result at $x = \ln \lambda_i - \ln \lambda_j$ leads to the
preference probability given in expression (1). As $r \to \infty$, a gamma random variable
with shape parameter r tends to a normally distributed random variable. Thus the
gamma model with r large is similar to a convolution type linear paired comparison
model with Gaussian sensation distribution, and therefore Gaussian defining
distribution. The Gaussian linear model is described by Thurstone (1927) and
refined by Mosteller (1951). For small values of r, the gamma model is similar to
the convolution type linear model whose sensation distribution is the ordinary
exponential distribution with a location parameter (Mosteller 1958, Noether 1960).
Formal statements describing the limiting behavior of the gamma model for small
or large r can be found in Stern (1987, 1990).
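
The statement that integrating $h_r$ gives the preference probability suggests a simple way to evaluate $g_r$ for any r > 0. The sketch below is one possible approach under stated assumptions, not the report's computation: it uses the symmetry of $h_r$ to write $H_r(x) = 1/2 + \int_0^x h_r(t)\,dt$ and applies Simpson's rule with an arbitrary number of steps. The function names and the step count are illustrative choices.

```python
from math import exp, lgamma, log, log1p


def defining_density(x, r):
    """h_r(x) = Gamma(2r) / Gamma(r)^2 * exp(-r x) / (1 + exp(-x))^(2r)."""
    x = abs(x)  # h_r is symmetric about zero; |x| keeps exp(-x) bounded
    log_h = lgamma(2 * r) - 2 * lgamma(r) - r * x - 2 * r * log1p(exp(-x))
    return exp(log_h)


def gamma_pref_prob(lam_i, lam_j, r, n_steps=2000):
    """g_r(lam_i / lam_j), computed as H_r(ln lam_i - ln lam_j)."""
    x = log(lam_i) - log(lam_j)
    if x == 0.0:
        return 0.5
    # Simpson's rule on [0, x]; n_steps must be even. A negative x gives a
    # negative step and hence a negative integral, as symmetry requires.
    step = x / n_steps
    total = defining_density(0.0, r) + defining_density(x, r)
    for k in range(1, n_steps):
        total += (4 if k % 2 else 2) * defining_density(k * step, r)
    return 0.5 + total * step / 3.0
```

For r = 1 this reproduces the logistic (Bradley-Terry) value, and for a fixed ratio above one the result grows with r, in line with the monotonicity described earlier.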

For the remainder of this article, discussion is focused on gamma paired
comparison models, or equivalently the subset of convolution type linear models
that they represent. This is a particularly interesting family because it
includes the most popular approaches to paired comparisons experiments in a
single family, indexed by the single parameter r. Naturally, there are linear
models that are not convolution type linear models (the uniform model considered
by Smith (1956), Mosteller (1958), Noether (1960)) and other convolution type
linear models (for example, those with the Student’s t-distribution or the Cauchy
distribution as the defining distribution) that are not considered here. Thus,
any answer to the question posed by the title of the article is incomplete.
Nonetheless, the evidence indicates that, within the class of linear models, all
models are essentially equivalent. In order to further discuss the empirical
evidence, we consider Latta’s (1979) partial ordering of paired comparison models.


3.  COMPOSITION  RULES  AND  A  PARTIAL  ORDERING  OF  MODELS 

As described earlier $p_{ij}^{(r)} = g_r(\lambda_i/\lambda_j)$ is increasing in the ratio of scale
parameters and, for fixed $\lambda_i < \lambda_j$, is decreasing in r. These facts are illustrated
in Figure 1 which shows the value of $p_{ij}^{(r)}$ for r between 0.01 and 100 when
$\lambda_i < \lambda_j$. The value of $p_{ij}^{(r)}$ for $\lambda_i > \lambda_j$ is obtained from
$g_r(\lambda_i/\lambda_j) = 1 - g_r(\lambda_j/\lambda_i)$. As illustrated in Figure 1, different ratios of the
$\lambda_i$ are required to obtain the same value of $p_{ij}$ for different values of r. It is
ordinarily the case that the estimates of $\lambda_i$, i = 1, ..., k, which are denoted by
$\hat\lambda_i$, i = 1, ..., k, are of less interest than the fitted preference probabilities
$\hat p_{ij} = g_r(\hat\lambda_i/\hat\lambda_j)$. For example, in comparing the results of different
paired comparison models, indexed by different values of r, we find the fitted values
to be the relevant means of comparison.

A property of all linear models is that $p_{ik}$ can be computed from $p_{ij}$ and $p_{jk}$.
The formula for this computation, called the triples function by Yellott (1977) and
the composition rule by Latta (1979), defines a function $G(\cdot,\cdot)$ such that
$p_{ik} = G(p_{ij}, p_{jk})$. For the gamma model with shape parameter r, the composition
rule can be expressed in terms of the inverse function
$g_r^{-1}(p) = \{\gamma : g_r(\gamma) = p\}$, where $\gamma$ is a ratio of scale parameters. The
inverse is well defined since $g_r(\gamma)$ is monotone in $\gamma$. The composition rule for
the gamma model with shape parameter r is

$$p_{ik}^{(r)} = G(p_{ij}^{(r)}, p_{jk}^{(r)}) = g_r\{ g_r^{-1}(p_{ij}^{(r)}) \, g_r^{-1}(p_{jk}^{(r)}) \}. \qquad (2)$$


As an illustration consider the Bradley-Terry model, where $g_1(\gamma) = \gamma/(\gamma + 1)$ and
$g_1^{-1}(p) = p/(1 - p)$. If $p_{ij}^{(1)} = 0.6$ and $p_{jk}^{(1)} = 0.8$, then $\lambda_i/\lambda_j = 1.5$ and
$\lambda_j/\lambda_k = 4$, from which $\lambda_i/\lambda_k = 6$ and $p_{ik}^{(1)} = 0.857$. The composition rule is
illustrated in Table 1 for a variety of values of $p_{ij}$, $p_{jk}$ and r. Table 1 also
includes the composition rule for convolution type linear models with the normal
sensation distribution and the exponential sensation distribution. The values
shown in Table 1 are for the section of the unit square in which $p_{ik} > 1/2$,
$(1 - p_{jk}) < p_{ij} < p_{jk}$. Values of the composition rule $G(p_{ij}, p_{jk})$ in other
sections of the unit square (e.g. $p_{ij} = 0.6$, $p_{jk} = 0.1$) can be obtained by applying
the following properties of the composition rule G (Latta 1979):

(i) $G(p_{ij}, 1/2) = p_{ij}$

(ii) $G(p_{ij}, 1 - p_{ij}) = 1/2$

(iii) $G(p_{ij}, p_{jk}) = G(p_{jk}, p_{ij})$ (symmetry)

(iv) $G(p_{ij}, p_{jk}) = 1 - G(1 - p_{ij}, 1 - p_{jk})$

(v) $p_{ik} = G(p_{ij}, p_{jk}) \iff p_{ij} = G(p_{ik}, 1 - p_{jk}) \iff p_{jk} = G(1 - p_{ij}, p_{ik})$.
These properties are easy to verify for the gamma models. Consider property (iv)
which is proved by a series of equalities using $g_r^{-1}(p) = 1/g_r^{-1}(1 - p)$ and
$g_r(\gamma) = 1 - g_r(1/\gamma)$:

$$G(p_{ij}, p_{jk}) = g_r\{ g_r^{-1}(p_{ij}) \, g_r^{-1}(p_{jk}) \} = g_r\left\{ \frac{1}{g_r^{-1}(1 - p_{ij})} \cdot \frac{1}{g_r^{-1}(1 - p_{jk})} \right\}$$

$$= 1 - g_r\{ g_r^{-1}(1 - p_{ij}) \, g_r^{-1}(1 - p_{jk}) \}$$

$$= 1 - G(1 - p_{ij}, 1 - p_{jk}).$$

Table 1 indicates that there is not much change in the value of $p_{ik}$ obtained for fixed
$p_{ij}$ and $p_{jk}$ as r varies from 0.01 to 100. The limiting behavior of the gamma models
for  small  and  large  r  is  also  demonstrated  in  Table  1.  Burke  and  Zinnes  (1965)  found 
that  the  composition  rules  of  the  Bradley-Terry  and  Thurstone-Mosteller  models 
are  quite  similar.  This  result  is  also  demonstrated  in  Table  1. 
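
The similarity of the Bradley-Terry and Thurstone-Mosteller composition rules can be reproduced with a few lines of code. The following sketch is illustrative only: the function names are hypothetical, and the Thurstone-Mosteller rule is written as $\Phi(\Phi^{-1}(p_{ij}) + \Phi^{-1}(p_{jk}))$, which follows from the linear-model form with a Gaussian defining distribution (the scale of the Gaussian cancels).

```python
from statistics import NormalDist


def compose_bradley_terry(p_ij, p_jk):
    """Composition rule for r = 1: multiply the odds, then convert back."""
    odds = (p_ij / (1.0 - p_ij)) * (p_jk / (1.0 - p_jk))
    return odds / (1.0 + odds)


def compose_thurstone_mosteller(p_ij, p_jk):
    """Composition rule for a Gaussian defining distribution."""
    z = NormalDist()  # standard normal
    return z.cdf(z.inv_cdf(p_ij) + z.inv_cdf(p_jk))
```

For $p_{ij} = 0.6$ and $p_{jk} = 0.8$ the Bradley-Terry rule returns 6/7, about 0.857 as in the worked example, and the Thurstone-Mosteller rule returns a slightly larger value.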

Latta (1979) introduces a partial ordering on paired comparison models. The
paired comparison model A is more extreme than the paired comparison model B if
for all $(p_{ij}, p_{jk}) \in (0.5, 1.0) \times (0.5, 1.0)$, $p_{ik}^{(A)} \ge p_{ik}^{(B)}$ with strict
inequality for some pair. As before, the definition is given in terms of one
quadrant of the unit square, since the definition is extended to the remainder of
the unit square via properties (i)-(v) above. Latta gives an algebraic proof that
the Thurstone-Mosteller model is more extreme than the Bradley-Terry model and
proves the following theorem that gives a sufficient condition for determining
whether one linear model is more extreme than a second in terms of the densities
of the defining distributions.

Theorem (Latta 1979 p.369): Suppose that

(A) $h_a$ and $h_b$ are densities whose associated c.d.f.’s, $H_a$ and $H_b$, satisfy the two
conditions (1) $H(x) = 1 - H(-x)$ and (2) $H^{-1}(p)$ exists for $p \in (0, 1)$.

(B) for every c > 0 there exist $N_1(c) \ge N_2(c) \ge 0$ such that

(i) $|t| < N_2(c) \Rightarrow h_b(t) < c\,h_a(ct)$

(ii) $N_2(c) < |t| < N_1(c) \Rightarrow h_b(t) > c\,h_a(ct)$

(iii) $|t| > N_1(c) \Rightarrow h_b(t) < c\,h_a(ct)$.

Then the linear model based on $h_b$ is more extreme than the linear model based on
$h_a$.

The  following  proposition  applies  this  theorem  to  gamma  paired  comparison 
models. 


Proposition 1. If $r_1 > r_2$ then the gamma paired comparison model with shape
parameter $r_1$ is more extreme than the gamma paired comparison model with shape
parameter $r_2$.
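
The ordering in Proposition 1 can be illustrated numerically for integer shape parameters. The sketch below is an assumption-laden illustration, not the report's code: it restricts r to integers so that $g_r$ has the exact binomial form of the "first to r points" race, inverts $g_r$ by bisection with a fixed number of halvings, and then applies composition rule (2). All function names are hypothetical.

```python
from math import comb


def g_integer(gamma, r):
    """g_r(gamma) for integer r, via the binomial race identity."""
    q = gamma / (1.0 + gamma)
    m = 2 * r - 1
    return sum(comb(m, s) * q**s * (1 - q) ** (m - s) for s in range(r, m + 1))


def g_inverse_integer(p, r):
    """Solve g_r(gamma) = p for gamma, 0 < p < 1, by bisection on q."""
    lo, hi = 0.0, 1.0
    for _ in range(60):  # 60 halvings exhaust double precision
        mid = (lo + hi) / 2.0
        if g_integer(mid / (1.0 - mid), r) < p:
            lo = mid
        else:
            hi = mid
    q = (lo + hi) / 2.0
    return q / (1.0 - q)


def compose_gamma_integer(p_ij, p_jk, r):
    """Composition rule (2): p_ik = g_r(g_r^{-1}(p_ij) * g_r^{-1}(p_jk))."""
    return g_integer(g_inverse_integer(p_ij, r) * g_inverse_integer(p_jk, r), r)
```

Evaluating `compose_gamma_integer(0.6, 0.8, r)` for r = 1, 2, 5 should give an increasing sequence that starts at the Bradley-Terry value 6/7, which is the behavior the proposition describes.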


Proof. The result is demonstrated by showing that the conditions in Latta’s
theorem are satisfied by the densities

$$h_{r_1}(x) = \frac{\Gamma(2r_1)}{\Gamma(r_1)\Gamma(r_1)} \frac{e^{-r_1 x}}{(1 + e^{-x})^{2r_1}}$$

and

$$h_{r_2}(x) = \frac{\Gamma(2r_2)}{\Gamma(r_2)\Gamma(r_2)} \frac{e^{-r_2 x}}{(1 + e^{-x})^{2r_2}}.$$

The densities satisfy the conditions in (A) and therefore we consider the ratio

$$R(t) = \frac{h_{r_1}(t)}{c\,h_{r_2}(ct)} = \frac{1}{c} \, \frac{\Gamma(2r_1)\Gamma(r_2)\Gamma(r_2)\,2^{2r_2}}{\Gamma(2r_2)\Gamma(r_1)\Gamma(r_1)\,2^{2r_1}} \, \frac{(\cosh\frac{ct}{2})^{2r_2}}{(\cosh\frac{t}{2})^{2r_1}},$$

where $\cosh(x) = (e^x + e^{-x})/2$, and the derivative of the ratio

$$\frac{dR}{dt} = R(t)\left(c r_2 \tanh\frac{ct}{2} - r_1 \tanh\frac{t}{2}\right),$$

where $\tanh(x) = (e^x - e^{-x})/(e^x + e^{-x})$. As the densities $h_{r_1}$, $h_{r_2}$ and the ratio R(t)
are symmetric we consider only t > 0. Form the function

$$k(r) = \frac{\Gamma(2r)}{\Gamma(r)\Gamma(r)\,2^{2r}}$$


from the coefficient of $h_r(x)$. Then it can be shown, using formulas for $\Gamma(r)$ and
$\Gamma'(r)$ from Chapter 6 of Abramowitz and Stegun (1964), that k(r) is increasing in
r and k(r)/r is decreasing in r. The conditions (B) of the theorem are verified by
considering c in three regions, c < 1, c > $r_1/r_2$ and the intermediate range.

For c < 1, dR/dt = 0 for t = 0 and dR/dt < 0 for t > 0. Also, R(0) > 1
since $r_1 > r_2$, c < 1, and k(r) is increasing in r, and R(t) is less than 1 as $t \to \infty$.
Thus, $h_{r_1}$ starts above $h_{r_2}$, the densities cross once and then $h_{r_1}$ remains below $h_{r_2}$
after the crossing. The conditions of the theorem are satisfied with $N_2(c) = 0$. In a
similar manner, we find that when c > $r_1/r_2$, dR/dt > 0 for t > 0, R(0) < 1 (since
k(r)/r is decreasing in r), and R(t) is greater than 1 as $t \to \infty$. The conditions of
the theorem are satisfied with $N_1(c) = \infty$.

For intermediate values of c, R(0) may be greater than one, less than one or
equal to one. However, the derivative has at most one change of sign, as can be
verified by showing that the ratio $(c r_2 \tanh cx)/(r_1 \tanh x)$ is monotone decreasing.
It turns out that for c < $\sqrt{r_1/r_2}$ there are no changes of sign of the derivative
and for c > $\sqrt{r_1/r_2}$ the derivative is initially positive and becomes negative. If
R(0) < 1 then R(t) increases initially and then decreases below one and remains
there as $t \to \infty$, whereas if R(0) > 1 then R(t) may decrease or increase initially
but eventually ends below one. In either case, the conditions of the theorem hold,
as the densities intersect at most twice (equivalently the ratio R(t) is equal to one
for at most two values of t). Thus, the conditions of the theorem are verified for all
values of c > 0. ∎

This section and the preceding section focus attention on a subset of the
convolution type linear models for paired comparisons experiments. The gamma paired
comparison models include the most popular paired comparison models and are
ordered by the extremeness of their composition rules. After briefly discussing
inference for paired comparisons experiments, the empirical phenomenon that many
models  provide  similar  fits  to  a  data  set  is  examined  by  considering  models  that  are 
extreme  points  in  the  family  of  gamma  models. 


4.  INFERENCE 

In the paired comparisons experiment with k objects, i and j are compared
$n_{ij} = n_{ji}$ times, with i preferred to j in $a_{ij}$ of the comparisons. No ties are
permitted. If successive comparisons are independent, then $a_{ij}$ is a binomial random
variable with $n_{ij}$ trials and the probability of a success on any trial is $g_r(\lambda_i/\lambda_j)$.
Finally, if comparisons among different pairs of objects are independent then the
likelihood for the entire data set is the product of $\binom{k}{2}$ binomial likelihoods. For
fixed r, the maximum likelihood estimates of the scale parameters $\lambda_i$ are obtained
using a combination of Newton-Raphson and steepest descent steps. This approach
works well except for small values of r, where an iterative approach (Ford 1957,
Stern 1987) is required until the solution is nearby. The likelihood cannot be
maximized if one object is always preferred to its competitors or if one object is
never preferred to its competitors. To maximize the likelihood over r, the
likelihood is evaluated for a grid of r values. This is more straightforward than
directly incorporating r into the Newton-Raphson/steepest descent maximization.
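
As an illustration of maximum likelihood fitting in the simplest case, the sketch below handles only the Bradley-Terry model (r = 1). It is not the Newton-Raphson/steepest descent scheme described above; it swaps in the classical fixed-point iteration for this model, in which each $\lambda_i$ is replaced by team i's total wins divided by $\sum_j n_{ij}/(\lambda_i + \lambda_j)$. The function name, the iteration count and the starting values are assumptions made for the example.

```python
def fit_bradley_terry(wins, n_iter=500):
    """Fit Bradley-Terry parameters from a k-by-k matrix of win counts.

    wins[i][j] is the number of times i was preferred to j. The returned
    parameters follow the convention that they sum to one. As noted in the
    text, the likelihood has no maximum if some object always wins or
    never wins, and this sketch does not guard against that case.
    """
    k = len(wins)
    lam = [1.0 / k] * k
    for _ in range(n_iter):
        updated = []
        for i in range(k):
            total_wins = sum(wins[i][j] for j in range(k) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (lam[i] + lam[j])
                for j in range(k)
                if j != i
            )
            updated.append(total_wins / denom)
        norm = sum(updated)
        lam = [value / norm for value in updated]  # enforce sum-to-one
    return lam
```

When the win matrix is generated exactly from a Bradley-Terry model, with no sampling variability, the iteration should return the generating parameters.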

To assess goodness of fit, consider the likelihood ratio test for the null
hypothesis that the gamma model with shape parameter r (viewed as being fixed for
the purposes of this discussion) is adequate versus the alternative hypothesis that
maximizes each binomial likelihood separately. In the latter case, $p_{ij}$ is estimated
by $a_{ij}/n_{ij}$, while in the former $p_{ij}$ is estimated by $g_r(\hat\lambda_i/\hat\lambda_j)$. The alternative
hypothesis might be preferred if the data contains many inconsistent triads of the
form $p_{ij} > 0.5$, $p_{jk} > 0.5$, $p_{ki} > 0.5$. These triads are not consistent with the
property of strong stochastic transitivity ($p_{ij}, p_{jk} \ge 1/2$ implies
$p_{ik} \ge \max(p_{ij}, p_{jk})$) (David 1988) that is implicitly assumed by all convolution
type linear models. The usual test statistic for the above hypothesis, which we use
as a measure of goodness of fit, is

$$Q_1 = 2 \sum_{i=1}^{k} \sum_{j \ne i} a_{ij} \log \frac{a_{ij}/n_{ij}}{g_r(\hat\lambda_i/\hat\lambda_j)}.$$

If the gamma model is correct and the $n_{ij}$ are large, then $Q_1$ has the chi-square
distribution with the number of degrees of freedom equal to the difference between
the number of free parameters in the two likelihoods,
$\frac{1}{2}(k-1)k - (k-1) = \frac{1}{2}(k-1)(k-2)$. In practice, r is estimated and should be
treated as a parameter for purposes of the goodness of fit test. However, models
with different values of r are considered as different models in the following
section and then compared to each other. Therefore r is treated as fixed in the
next section. Notice that the usual likelihood ratio procedure cannot be used to
test whether one gamma model is superior to another since the models are not
nested. $Q_1$ is used to compare the fit of the models in the following section.
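
The statistic $Q_1$ and its degrees of freedom are simple to compute once fitted probabilities are available. The following is a minimal sketch with hypothetical function names; it assumes the fitted probabilities are supplied as a k-by-k matrix and treats terms with $a_{ij} = 0$ as contributing zero.

```python
from math import log


def goodness_of_fit(wins, fitted):
    """Q1 = 2 * sum_i sum_{j != i} a_ij * log((a_ij / n_ij) / fitted p_ij)."""
    k = len(wins)
    total = 0.0
    for i in range(k):
        for j in range(k):
            if i == j:
                continue
            a_ij = wins[i][j]
            n_ij = wins[i][j] + wins[j][i]
            if a_ij > 0:  # a zero count contributes nothing to the sum
                total += a_ij * log((a_ij / n_ij) / fitted[i][j])
    return 2.0 * total


def degrees_of_freedom(k):
    """(k - 1)(k - 2) / 2, the difference in free parameter counts."""
    return (k - 1) * (k - 2) // 2
```

A fit that reproduces every observed proportion gives $Q_1 = 0$; with k = 14 objects the reference distribution has 78 degrees of freedom, and with k = 3 it has 1.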


5.  ARE  ALL  LINEAR  MODELS  THE  SAME? 

Consider the 1989 American League baseball season as a paired comparisons
experiment to determine the relative ability of the fourteen teams. In the following
matrix A, each team is represented by one row and column. The entries in row i,
$a_{ij}$, correspond to the number of wins for team i in contests with team j.

The fourteen teams in the American League are divided into two seven team
divisions. The top seven rows represent the teams in one division and the bottom
seven rows represent the teams in the other division. Teams play 13 games against
each team in their division and 12 games against each team in the other division.
No ties are possible. One game, between team 5 and team 14, was cancelled due to
inclement weather.

We  consider  the  fit  obtained  by  applying  gamma  paired  comparison  models 
to  the  results  of  baseball  games  even  though  the  point  scoring  process  in  baseball 
is  not  similar  to  a  Poisson  process.  The  maximum  likelihood  estimates  for  gamma 
models  with  r  ranging  from  0.1  to  50  were  obtained,  and  the  goodness  of  fit  statistic 
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Qx  computed  for  each  model.  The  values  of  Q i  range  from  81.47  for  r  =  0.1  to 
81.19  for  r  =  50.  The  Bradley- Terry  model  has  Qx  =  81.22.  The  maximum  of  the 
likelihood,  equivalent  to  the  minimum  value  of  Qi ,  over  the  values  of  r  considered 
here  is  obtained  at  r  =  50  (approximately  the  Thurstone-Mosteller)  model.  On  the 
one  hand,  we  have  the  positive  result  that  the  gamma  models  provide  an  adequate 
fit  to  the  data  (values  of  Qx  should  be  compared  to  the  chi-square  distribution  with 
78  degrees  of  freedom  in  this  case).  However,  the  variation  among  models  is  so  small 
that  no  model  is  obviously  preferred  to  the  others.  If  r  is  viewed  as  a  parameter 
of  the  model  and  f  indicates  the  value  of  r  that  maximizes  the  likelihood,  then  an 
asymptotic  95%  confidence  interval  for  r  includes  all  values  of  r  such  that  Qx  within 
3.84  (the  upper  5%  point  of  the  chi-square  distribution  on  one  degree  of  freedom) 
of  the  minimum  value  of  Qx .  For  this  data  set  the  confidence  interval  contains  all 
values  of  r  between  0  and  50  (larger  r  were  not  considered).  The  largest  difference 
between  the  residuals  of  one  gamma  model  (the  difference  between  the  matrix  A 
and  the  fitted  values  obtained  by  a  given  model)  and  the  residuals  of  a  second  is 
0.17.  The  magnitude  of  the  residuals  range  from  0  to  4.79  so  that  the  variation 
among  models  is  much  smaller  than  one  might  expect.  Large  residuals  typically 
correspond  to  extreme  results,  pairs  in  which  one  team  dominates  another  despite 
the  fact  that  each  team  won  at  least  35%  of  their  games  overall.  The  results  for 
the  1989  American  League  season  as  well  as  nine  other  baseball  data  sets  and  five 
recent  basketball  seasons  (teams  play  each  other  between  2  and  5  times)  are  given 
in  Table  2.  The  results  from  five  football  seasons,  in  which  teams  play  each  other 
0,  1  or  2  times,  are  also  given.  The  chi-square  approximation  is  inappropriate  for 
the  football  data  due  to  the  small  sample  sizes.  However,  the  similarity  of  the  fit 
provided  by  different  values  of  r  is  striking.  In  each  case  but  one,  the  values  of  Qx 
are  either  monotone  increasing  or  monotone  decreasing  indicating  that  the  “best” 
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model  is  obtained  by  using  the  largest  or  smallest  value  of  r.  The  results  of  the 
sports  data  sets  reinforce  the  earlier  results  of  Mosteller  (1958)  and  Jackson  and 
Fleckenstein  (1957). 
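
The confidence interval rule described above can be sketched in a few lines. This is a minimal illustration, not the author's code: the function name `likelihood_interval` is hypothetical, the Qx values are the 1989 American League entries from Table 2, and only the four tabulated values of r are examined rather than the full range from 0 to 50.

```python
from statistics import NormalDist


def likelihood_interval(q_by_r, level=0.95):
    """Return the values of r whose Qx lies within the chi-square(1)
    critical value of the smallest Qx (hypothetical helper)."""
    # The upper 5% point of chi-square on 1 df is the square of the
    # two-sided normal quantile, about 3.84.
    crit = NormalDist().inv_cdf(0.5 + level / 2) ** 2
    q_min = min(q_by_r.values())
    return sorted(r for r, q in q_by_r.items() if q - q_min <= crit)


# Qx for the 1989 American League season, copied from Table 2.
q_1989_al = {0.1: 81.4704, 1.0: 81.2221, 10.0: 81.1914, 50.0: 81.1888}
inside = likelihood_interval(q_1989_al)
print(inside)
```

All four tabulated shape values fall inside the interval, since the largest gap between any two Qx values for this season is under 0.3.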

To  investigate  more  thoroughly  why  this  occurs,  some  calculations  for  artificial 
data are considered. Consider data that is generated from the gamma model with
shape parameter r = 0.1, pij = 0.9, pjk = 0.9, and, as indicated by the composition
rule, pik = 0.9803. Initially assume that 100 comparisons of each pair are carried
out, with results exactly matching the model, i.e. i is preferred to j in 90 out of 100
comparisons,  j  is  preferred  to  k  in  90  out  of  100  comparisons,  and,  to  be  precise,  i 
is  preferred  to  k  in  98.03  comparisons.  This  represents  a  data  set  with  no  sampling 
variability.  Gamma  models  with  other  values  of  r  can  be  fit  to  this  “observed”  data, 
equivalent  to  misspecifying  the  model.  Naturally,  r  =  0.1  provides  a  perfect  fit  to 
the  data,  Qx  =  0.  Even  the  most  extreme  model  considered,  r  =  50,  has  a  small 
value  of  the  goodness  of  fit  statistic  Qx  =  1.58.  Recall  that,  for  an  experiment  with 
3  objects,  when  testing  a  particular  gamma  model  against  the  alternative  that  each 
pij is estimated separately, Qx can be compared to the chi-square distribution on
1  degree  of  freedom.  Thus  100  comparisons  per  pair  are  not  sufficient  to  reject  the 
r  =  50  model  when  the  data  is  generated  by  the  r  =  0.1  model  with  no  error  or 
variability.  Noether  (1960)  applied  the  same  approach  using  an  alternative  measure 
of  fit.  Using  Qx  enables  us  to  determine  the  sample  size  required  to  distinguish 
between  models.  At  usual  significance  levels,  250  observations  of  each  pair  are 
required  to  reject  the  r  =  50  model  as  inadequate  (compared  to  the  saturated 
model)  when  the  data  is  generated  by  the  r  =  0.1  model.  The  same  analysis  was 
repeated for a variety of pij and pjk values, specifically, a grid where pij and pjk were
multiples  of  0.05.  The  result  described  above  is  the  scenario  for  which  the  models 
differed  by  the  largest  amount.  In  other  cases  500,  1000  or  more  comparisons  of 
each pair are required to distinguish the r = 0.1 model from the r = 50 model. Even
larger  sample  sizes  are  required  to  distinguish  the  Bradley-Terry  model  (r  =  1)  from 
other  gamma  models. 
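
The step from Qx = 1.58 at 100 comparisons to roughly 250 comparisons can be checked with simple arithmetic. The sketch below assumes that, when the observed proportions are held fixed at their model values, Qx grows in proportion to the number of comparisons per pair (as Pearson and likelihood ratio statistics do); the exact definition of Qx is not restated here, and the helper name `comparisons_needed` is hypothetical.

```python
import math
from statistics import NormalDist


def comparisons_needed(q_at_n, n, level=0.95):
    """Smallest number of comparisons per pair at which a statistic that
    scales linearly in n reaches the chi-square(1) critical value."""
    crit = NormalDist().inv_cdf(0.5 + level / 2) ** 2  # about 3.84
    return math.ceil(n * crit / q_at_n)


# Qx = 1.58 for the r = 50 model with 100 comparisons per pair (from the text).
n_needed = comparisons_needed(q_at_n=1.58, n=100)
print(n_needed)
```

Under this proportionality assumption the threshold falls a little under 250 comparisons per pair, in line with the figure quoted above.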

The  previous  analysis  and  that  of  Noether  (1960)  ignore  the  variability  that 
occurs  in  samples.  If  random  paired  comparisons  experiments  are  simulated  in 
which  100  comparisons  of  each  of  the  three  pairs  of  objects,  i  versus  j,  i  versus 
k, and j versus k, are generated, then the results are similar. For the example
discussed  in  the  preceding  paragraph,  the  average  goodness  of  fit  statistic  over  1000 
replications  for  the  model  that  generated  the  data  (r  =  0.1),  was  1.133  and  the 
standard  deviation  of  the  statistics  was  1.534  (consistent  with  the  null  distribution, 
chi-square on one degree of freedom). The average goodness of fit statistic for the
r  =  50  model  is  2.619  and  the  standard  deviation  is  3.018.  The  average  difference 
between  the  two  models  is  1.486,  slightly  smaller  than  the  result  obtained  from  data 
with  no  variability.  The  r  =  50  model  provides  a  better  fit  than  the  model  that 
generated  the  data  in  31%  of  the  samples.  Simulations  for  five  objects  indicate  again 
that  several  hundred  comparisons  of  each  pair  are  required  to  distinguish  between 
models.  The  required  sample  size  is  smallest  in  those  data  sets  for  which  some  of 
the pij are extreme.
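
The data generation step of such a simulation can be sketched as follows. This covers only the sampling of win counts, assuming independent comparisons with fixed preference probabilities; fitting the gamma models and computing Qx are not shown. The function name `simulate_counts`, the pair labels, and the seed are illustrative choices, not details from the study.

```python
import random


def simulate_counts(probs, n_per_pair, n_reps, seed=0):
    """Draw n_reps data sets; each records, for every pair, how many of
    n_per_pair independent comparisons the first object wins."""
    rng = random.Random(seed)
    data_sets = []
    for _ in range(n_reps):
        wins = {
            pair: sum(rng.random() < p for _ in range(n_per_pair))
            for pair, p in probs.items()
        }
        data_sets.append(wins)
    return data_sets


# Preference probabilities for the r = 0.1 example discussed above.
probs = {("i", "j"): 0.9, ("j", "k"): 0.9, ("i", "k"): 0.9803}
sims = simulate_counts(probs, n_per_pair=100, n_reps=1000)
mean_ij = sum(d[("i", "j")] for d in sims) / len(sims)
mean_ik = sum(d[("i", "k")] for d in sims) / len(sims)
print(mean_ij, mean_ik)
```

For reference, a chi-square variable on one degree of freedom has mean 1 and standard deviation √2 ≈ 1.414, which is the benchmark for the 1.133 and 1.534 reported for the generating model.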

For  experiments  with  fewer  comparisons  of  each  pair,  the  extreme  probabilities 
used  above  frequently  produce  simulated  data  sets  such  that  i  is  always  preferred 
to j and k. Maximum likelihood estimates cannot be obtained for such data sets.
Simulations were carried out using less extreme values of pij, pjk, pik. Consider
1000  simulated  data  sets  consisting  of  20  comparisons  of  each  pair  of  three  objects 
with r = 0.1, pij = 0.6, pjk = 0.9, pik = 0.9210. The average difference between
the goodness of fit statistic for r = 0.1 and the goodness of fit statistic for r = 50 is
0.205.  The  incorrect  model,  r  =  50,  is  preferred  for  43%  of  the  data  sets.  It  is  more 
difficult to distinguish between the models in this case due to the decreased sample
size  (number  of  comparisons)  and  the  less  extreme  preference  probabilities. 
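
The remark that small experiments with extreme probabilities often yield data sets in which i is always preferred can be quantified with a short calculation. The sketch assumes independent comparisons and uses the earlier extreme values pij = 0.9 and pik = 0.9803; the function name is hypothetical.

```python
def prob_i_always_preferred(p_ij, p_ik, n):
    """Probability that i wins all n comparisons against j and all n
    comparisons against k, assuming independent comparisons."""
    return (p_ij ** n) * (p_ik ** n)


p20 = prob_i_always_preferred(0.9, 0.9803, 20)    # 20 comparisons per pair
p100 = prob_i_always_preferred(0.9, 0.9803, 100)  # 100 comparisons per pair
print(p20, p100)
```

With 20 comparisons per pair this event alone occurs in roughly 8 percent of data sets, whereas with 100 comparisons per pair it is negligible, which is why the smaller experiments were simulated with less extreme probabilities.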


6.  DISCUSSION 

The  sports  data  sets  and  simulations  seem  to  answer  Mosteller’s  (1958  pg  284) 
call  to  “explore  the  sensitivity  of  the  method  of  paired  comparisons  to  the  shape  of 
the curve used to grade the responses”. The gamma models provide a convenient
family  of  models  indexed  by  a  single  parameter  that  can  be  used  to  explore  the 
question.  By  comparing  models  at  extreme  values  of  the  shape  parameter,  the 
Thurstone-Mosteller  model  (r  large)  and  the  exponential  model  (r  near  zero),  over  a 
wide  range  of  data  sets  and  simulation  scenarios,  we  find  that  the  paired  comparisons 
analysis  is  not  very  sensitive  to  the  choice  of  distribution  within  the  class  of  linear 
models.  Moreover,  in  experiments  with  three  objects,  it  appears  that  at  least  250 
comparisons  of  each  pair  of  objects  are  required  to  distinguish  between  models  using 
a  goodness  of  fit  test  statistic.  The  work  of  Mosteller  (1958)  and  Noether  (1960) 
shows  that  the  linear  model  defined  by  the  uniform  distribution  (not  part  of  the 
gamma  models  but  more  extreme  than  even  the  Thurstone-Mosteller  model)  also 
provides  a  similar  fit. 

In part, this result seems to be an example of the similarity of many distributions
at the center of the distribution (see Cox 1970 for more details). The similarity
between the fits obtained with the Bradley-Terry and Thurstone-Mosteller models
is not surprising given the similarity of the logistic and normal distribution functions.
The linearity assumption of the paired comparison models is also a part of the
explanation.  This  assumption  leads  us  to  only  consider  strongly  transitive  models 
as  the  k  objects  are  assumed  to  be  rank  ordered  on  a  linear  scale.  The  particular 
distribution used to fit the linear model does not seem to be as important as the
determination  of  whether  a  linear  model  is  appropriate. 

Some data sets will be consistent with simpler models; for example, the objects
may be organized as groups of similar objects. Then a linear model with some
parameters  set  equal  to  each  other  will  be  sufficient.  In  other  cases,  those  with 
inconsistencies  for  instance,  a  model  that  assigns  one  parameter  per  object  will  not 
be  sufficient.  This  leads  to  more  sophisticated  models  (Davidson  and  Bradley  1969, 
Hiyashi  1964,  Marley  1988)  that  allow  objects  to  be  compared  on  one  of  several 
possible dimensions. Item i might be preferred to item j on one dimension but
j  might  be  preferred  on  other  dimensions.  The  outcome  of  a  paired  comparison 
depends  on  which  dimension(s)  are  used  to  compare  the  objects.  The  nature  of 
the comparison experiment must dictate which model is appropriate. The comprehensive
study here suggests that if a linear model is selected, the particular linear
model  does  not  have  a  large  effect  on  the  analysis  for  the  usual  sample  sizes. 

The similarity of fits among the linear models seems to also hold in experiments
in which more than two objects are compared at a time. The order statistics
ranking  models  described  by  Critchlow,  Fligner  and  Verducci  (1990)  are  the  natural 
extension  of  the  linear  models  to  such  experiments.  Simulations  like  those  described 
here  indicate  that  the  fit  obtained  by  order  statistics  models  is  not  sensitive  to  the 
distribution  used. 

Figure 1. Preference probabilities in the gamma paired comparison model as a
function of the ratio of the scale parameters λi/λj. (Horizontal axis: ratio of
scale parameters; vertical axis: preference probability.)

Table 1. Value of pik Obtained for Different Gamma Models

                          pik under each model
pij   pjk   Exponential   r = ?    r = 0.1   r = 1    r = ?    r = ?    Normal
0.2   0.9     .75000      .74996   .74639    .69231   .67218   .67022   .67001
0.3   0.8     .66667      .66661   .66198    .63158   .62513   .62453   .62446
0.3   0.9     .83333      .83331   .83094    .79412   .77743   .77571   .77552
0.4   0.7     .62500      .62494   .62055    .60870   .60700   .60684   .60682
0.4   0.8     .75000      .74996   .74681    .72727   .72236   .72188   .72183
0.4   0.9     .87500      .87498   .87340    .85714   .84905   .84817   .84807
0.6   0.6     .68000      .68005   .68356    .69231   .69367   .69380   .69382
0.6   0.7     .76000      .76004   .76301    .77778   .78126   .78160   .78164
0.6   0.8     .84000      .84003   .84202    .85714   .86260   .86317   .86323
0.6   0.9     .92000      .92001   .92101    .93103   .93684   .93752   .93760
0.7   0.7     .82000      .82003   .82253    .84483   .85203   .85278   .85287
0.7   0.8     .88000      .88002   .88170    .90323   .91286   .91392   .91403
0.7   0.9     .94000      .94001   .94085    .95455   .96336   .96442   .96454
0.8   0.8     .92000      .92001   .92114    .94118   .95240   .95369   .95384
0.8   0.9     .96000      .96001   .96057    .97297   .98194   .98301   .98313
0.9   0.9     .98000      .98000   .98029    .98780   .99402   .99473   .99481

(r = ?: shape value not legible in the source. The columns are ordered by
increasing r, from the exponential model to the normal model.)
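
Several columns of Table 1 can be reproduced from the composition rule of a linear model, pik = F(F⁻¹(pij) + F⁻¹(pjk)), where F is the distribution function that defines the model. The sketch below uses the logistic F for the Bradley-Terry model (r = 1) and the normal F for the Thurstone-Mosteller model. The Laplace F used for the exponential column is an assumption: it is not stated in this section, but it matches the tabulated values. All function names are illustrative.

```python
import math
from statistics import NormalDist

_normal = NormalDist()


def compose(p_ij, p_jk, cdf, inv_cdf):
    """Composition rule of a linear paired comparison model with
    distribution function cdf: add the two scale differences."""
    return cdf(inv_cdf(p_ij) + inv_cdf(p_jk))


def bradley_terry(p_ij, p_jk):
    # Logistic distribution function and its inverse (the logit).
    return compose(p_ij, p_jk,
                   lambda x: 1.0 / (1.0 + math.exp(-x)),
                   lambda p: math.log(p / (1.0 - p)))


def thurstone_mosteller(p_ij, p_jk):
    # Standard normal distribution function and its inverse.
    return compose(p_ij, p_jk, _normal.cdf, _normal.inv_cdf)


def exponential_limit(p_ij, p_jk):
    # ASSUMPTION: Laplace (double exponential) distribution function.
    def cdf(x):
        return 0.5 * math.exp(x) if x < 0 else 1.0 - 0.5 * math.exp(-x)

    def inv_cdf(p):
        return math.log(2.0 * p) if p < 0.5 else -math.log(2.0 * (1.0 - p))

    return compose(p_ij, p_jk, cdf, inv_cdf)


for p_ij, p_jk in [(0.2, 0.9), (0.6, 0.6), (0.9, 0.9)]:
    print(p_ij, p_jk,
          round(exponential_limit(p_ij, p_jk), 5),
          round(bradley_terry(p_ij, p_jk), 5),
          round(thurstone_mosteller(p_ij, p_jk), 5))
```

The intermediate gamma columns require the incomplete beta function and are not reproduced here.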

Table 2. Comparing Models on Sports Data Sets

League and Season                 Teams   r = 0.1    r = 1.0    r = 10.0   r = 50.0
1989 American League Baseball       14    81.4704    81.2221    81.1914    81.1888
1986 American League Baseball       14    73.6317    73.6597    73.6610    73.6611
1985 American League Baseball       14    89.0463    89.1551    89.1786    89.1806
1984 American League Baseball       14    86.8979    86.8070    86.7829    86.7809
1983 American League Baseball       14    58.5468    58.5953    58.5932    58.5929
1989 National League Baseball       12    51.5812    51.5357    51.5314    51.5310
1986 National League Baseball       12    50.5396    50.0067    49.9121    49.9039
1985 National League Baseball       12    56.7934    56.6084    56.5811    56.5788
1984 National League Baseball       12    53.3042    53.4228    53.4357    53.4368
1983 National League Baseball       12    64.7119    64.7488    64.7516    64.7518
1981 National Basketball Assoc.     23    238.843    239.257    239.671    239.715
1980 National Basketball Assoc.     22    210.316    208.117    207.578    207.532
1979 National Basketball Assoc.     22    224.593    223.633    223.447    223.431
1978 National Basketball Assoc.     22    181.805    181.713    181.730    181.731
1977 National Basketball Assoc.     22    222.933    223.512    223.613    223.622
1986 National Football League       28    152.877    153.056    152.766    152.728
1985 National Football League       28    169.866    169.782    169.386    169.343
1984 National Football League       28    156.969    156.769    156.402    156.358
1983 National Football League       28    186.809    186.660    186.482    186.461
1981 National Football League       28    192.906    194.526    194.694    194.698